Fix dataset timestamps #190
Conversation
Force-pushed from 0ecae5f to 054f5f2.
I noticed that two methods aren’t called anywhere (neither in servicelayer nor in Aleph). I was trying to add test coverage to ensure my changes don’t break their expected behavior, but now I’m wondering whether these methods can simply be removed?
Force-pushed from 2b45674 to 216acbf.
```diff
@@ -275,9 +248,8 @@ def checkout_task(self, task_id, stage):
     pipe.srem(self.pending_key, task_id)
     pipe.sadd(self.running_key, task_id)
-    pipe.set(self.start_key, pack_now())
+    pipe.set(self.start_key, pack_now(), nx=True)
```
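The `nx=True` flag carries the fix: with Redis `SET ... NX`, the write only happens if the key does not exist yet, so `start_time` keeps the timestamp of the first task instead of being overwritten on every checkout. A minimal in-memory sketch of that semantics (a hypothetical stand-in, not the servicelayer code):

```python
# Hypothetical in-memory stand-in for the Redis SET command, illustrating
# the NX ("only set if not exists") semantics used for `start_key` above.
class FakeRedis:
    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False):
        # redis-py returns None when NX prevents the write, True otherwise
        if nx and key in self.data:
            return None
        self.data[key] = value
        return True

r = FakeRedis()
r.set("dataset:1:start_time", "2024-01-01T00:00:00")           # first task
r.set("dataset:1:start_time", "2024-01-02T00:00:00", nx=True)  # no-op
print(r.data["dataset:1:start_time"])  # → 2024-01-01T00:00:00
```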
I’m not entirely sure setting the `start_time` timestamp here is necessary in the first place, as a worker shouldn’t be able to check out a task without `add_task` having been executed before. (The same probably also applies to `pipe.sadd(self.key, self.name)`.)
I didn’t change it because the primary goal of this PR is to fix the timestamps, not to refactor, but I’d still be interested to understand whether there’s a legitimate case where this is relevant or whether it’s merely a precaution.
It seems like some sort of defensive programming approach. Which I can't say I dislike :)
@stchris @catileptic This isn’t 100% done, but I added two comments with questions and would like to know whether you agree with the approach (primarily removing …).
Force-pushed from 67a63d2 to c3372dd.
I’m still a little unsure about the two questions above, but apart from that, this is ready.
Thanks for this, @tillprochaska ! I also appreciate the proper cleanup and the concise test cases
servicelayer/taskqueue.py (outdated)
```diff
@@ -364,6 +322,29 @@ def is_task_tracked(self, task: Task):
         return tracked

+    def flush_status(self, pipe):
```
Thanks, extracting helped the readability of the code tremendously! 🥇
```diff
         collection_id="1",
     )

+    # Adding a task updates `start_time` and `last_update`
```
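The invariant this test exercises can be modelled in a few lines (an illustrative sketch with made-up names, not the actual servicelayer implementation): every added task bumps `last_update`, while `start_time` is written only once per active period.

```python
# Toy model of the dataset status update. `setdefault` mirrors the
# SET ... NX behaviour from the PR: the first write wins.
def add_task(status, now):
    status.setdefault("start_time", now)  # only set for the first task
    status["last_update"] = now           # always updated

status = {}
add_task(status, "2024-01-01T10:00:00")
add_task(status, "2024-01-01T11:00:00")
print(status["start_time"], status["last_update"])
# → 2024-01-01T10:00:00 2024-01-01T11:00:00
```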
I appreciate your comments in this non-trivial test!
Thanks for the review, @stchris! Any thoughts about #190 (comment)?
Sorry, I totally forgot about these questions again, @tillprochaska!
This one is used by the Aleph worker in the periodic task implementation: https://github.com/alephdata/aleph/blob/7f26ac7bbf52864f048d58cd48b8d5f5784d29a5/aleph/worker.py#L141
Couldn't find any use of this either, I think this can go!
We currently do not retain information about inactive datasets (i.e. datasets that do not have any more pending or running tasks). For this reason, the "end_time" timestamp is currently of no use, as it would never be displayed to users anyway. While there are some plans around providing more detail about the results of processed tasks as well as completed jobs, it is unclear where this data will be stored and what the implementation will look like. As it is easy enough to add this information back (< 10 LOC), I’ve removed it for now.
Force-pushed from c3372dd to 2b9f9ae.
@stchris Thanks, I’ve rebased the PR and applied changes based on your answers.
👍 Removed it.
Ah, thanks, I overlooked that. This makes me wonder whether it is required at all? (I’m asking because I was trying to come up with a test case for this method, but wasn’t able to reproduce any situation where it isn’t just a no-op.)
I didn't check the logic flow, but it seems to me like you are right. There should not be a problem with you removing this code then ™️
`cleanup_dataset_status` iterates over all active datasets and removes status information if it is done (i.e. there are no more pending or running tasks). It is called by the Aleph worker periodically. We already do this for individual datasets whenever a task from the dataset is done or the dataset is cancelled as a whole. That means that `cleanup_dataset_status` is redundant. Also see: #190 (comment)
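The per-task cleanup that makes `cleanup_dataset_status` redundant could look roughly like this (a hypothetical sketch with made-up names, not the servicelayer code):

```python
# Toy model: status data is dropped as soon as a dataset has neither
# pending nor running tasks, so no periodic sweep is needed.
def mark_task_done(datasets, dataset_id, task_id):
    status = datasets[dataset_id]
    status["pending"].discard(task_id)
    status["running"].discard(task_id)
    if not status["pending"] and not status["running"]:
        del datasets[dataset_id]  # inactive datasets keep no status

datasets = {"1": {"pending": set(), "running": {"task-a"}}}
mark_task_done(datasets, "1", "task-a")
print("1" in datasets)  # → False
```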
I noticed the same issue reported in alephdata/aleph#3787 when working on alephdata/aleph#3788. The current implementation updates the start time whenever a new task is added or checked out, no matter whether the dataset is active or not.
This PR changes the implementation of the dataset status data stored in Redis as follows:

- `last_update` remains unchanged.
- `start_time` is set to the current time only if it isn’t set yet, i.e. if the task being added is the first task added to a dataset that was inactive before.
- `end_time` has been removed (see fd29298 for reasoning), but I’m happy to add it back if that is preferred.
- I’ve added the `time-machine` package as a dev dependency for testing this. In my experience, manually mocking dates/times in tests is quite brittle, and I’ve successfully used this package in multiple projects (including in this Aleph PR).

Just a note for future reference, as we discussed this but it isn’t documented anywhere: parts of the status tracking in Redis are probably prone to race conditions when multiple workers are running. This PR doesn't change that. In case this is an issue in practice, we might want to try to replace the logic with operations that can be executed atomically.